AITopics | population loss

Collaborating Authors

population loss

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

33ffc1a4a06e19865ac6cba04c753121-Paper-Conference.pdf

Neural Information Processing SystemsJun-23-2026, 06:27:38 GMT

We study which machine learning algorithms have tight generalization bounds with respect to a given collection of population distributions. Our results build on and extend the recent work of Gastpar et al. (2024). First, we present conditions that preclude the existence of tight generalization bounds. Specifically, we show that algorithms that have certain inductive biases that cause them to be unstable do not admit tight generalization bounds. Next, we show that algorithms that are sufficiently loss-stable do have tight generalization bounds. We conclude with a simple characterization that relates the existence of tight generalization bounds to the conditional variance of the algorithm's loss.

algorithm, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.92)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks

Neural Information Processing SystemsJun-15-2026, 11:42:25 GMT

We study the dynamics of stochastic gradient descent (SGD) for a class of sequence models termed Sequence Single-Index (SSI) models, where the target depends on a single direction in input space applied to a sequence of tokens. This setting generalizes classical single-index models to the sequential domain, encompassing simplified one-layer attention architectures. We derive a closed-form expression for the population loss in terms of a pair of sufficient statistics capturing semantic and positional alignment, and characterize the induced high-dimensional SGD dynamics for these coordinates. Our analysis reveals two distinct training phases: escape from uninformative initialization and alignment with the target subspace, and demonstrates how the sequence length and positional encoding influence convergence speed and learning trajectories. These results provide a rigorous and interpretable foundation for understanding how sequential structure in data can be beneficial for learning with attention-based models. Stochastic Gradient Descent (SGD) is the core optimization tool driving modern machine learning. Recent years have seen substantial progress in understanding its dynamics, particularly in two-layer networks [Saad and Solla, 1995, Mei et al., 2018, Chizat and Bach, 2018, Rotskoff and VandenEijnden, 2022, Sirignano and Spiliopoulos, 2020, Arnaboldi et al., 2023a]. While global convergence is qualitatively well-understood when the network is wide enough, quantitative results are scarcer. A particularly fruitful body of recent theoretical work addressing this gap has focused on deriving precise convergence rates for particular model classes on synthetic data, such as high-dimensional Gaussian single and multi-index models [Ben Arous et al., 2021, Abbe et al., 2022, 2023].

artificial intelligence, machine learning, sie, (17 more...)

Neural Information Processing Systems

Country:

Europe > France (0.46)
North America > United States (0.45)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.25)

Genre: Research Report > Experimental Study (1.00)

Industry: Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

A prism hierarchy of learning regimes in large linear autoencoders

Golikov, Eugene, Gusev, Yaroslav, Yarotsky, Dmitry

arXiv.org Machine LearningJun-5-2026

Theoretical studies of machine learning models commonly consider different limiting regimes in which the learning dynamics of gradient descent becomes theoretically tractable. It is, however, desirable to have a systematically obtained picture of all qualitatively different extreme learning regimes for a particular type of models. In this paper we propose such a picture for large weight-tied linear autoencoders characterized by input and latent dimensions, initialization magnitude, and training set size. This model is nonlinear in the weights and its gradient flow does not have a general theoretical solution. We show that at the level of the formal loss-expansion hierarchy, its extreme regimes are naturally associated with faces of a triangular prism. In particular, there are five basic extreme regimes associated with the 2-faces of the prism: (1) large-data, (2) small-data, (3) mean-field, (4) narrow-latent, and (5) free. For regimes (1,2,3,4), we derive explicit expressions for both train and population limiting loss evolutions under gradient flow, obtaining very good agreement with experimental results.

artificial intelligence, machine learning, regime, (19 more...)

arXiv.org Machine Learning

2606.05335

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

2b763288faedb7707c0748abe015ab6c-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 06:34:00 GMT

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre:

Research Report (0.46)
Instructional Material (0.46)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Generalization of Model-Agnostic Meta-Learning Algorithms: Recurring and Unseen Tasks

Neural Information Processing SystemsApr-25-2026, 06:33:56 GMT

In this paper, we study the generalization properties of Model-Agnostic MetaLearning (MAML) algorithms for supervised learning problems. We focus on the setting in which we train the MAML model over mtasks, each with ndata points, and characterize its generalization error from two points of view: First, we assume the new task at test time is one of the training tasks, and we show that, for strongly convex objective functions, the expected excess population loss is bounded by O(1/mn). Second, we consider the MAML algorithm's generalization to an unseen task and show that the resulting generalization error depends on the total variation distance between the underlying distributions of the new task and the tasks observed during the training process. Our proof techniques rely on the connections between algorithmic stability and generalization bounds of algorithms. In particular, we propose a new definition of stability for meta-learning algorithms, which allows us to capture the role of both the number of tasks mand number of samples per task non the generalization error of MAML.

artificial intelligence, generalization error, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre:

Research Report (0.46)
Instructional Material (0.46)

Industry:

Education (0.48)
Government (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback

A theory of learning data statistics in diffusion models, from easy to hard

Bardone, Lorenzo, Merger, Claudia, Goldt, Sebastian

arXiv.org Machine LearningMar-16-2026

While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue first by empirically showing that standard diffusion models trained on natural images exhibit a distributional simplicity bias, learning simple, pair-wise input statistics before specializing to higher-order correlations. We reproduce this behaviour in simple denoisers trained on a minimal data model, the mixed cumulant model, where we precisely control both pair-wise and higher-order correlations of the inputs. We identify a scalar invariant of the model that governs the sample complexity of learning pair-wise and higher-order correlations that we call the diffusion information exponent, in analogy to related invariants in different learning paradigms. Using this invariant, we prove that the denoiser learns simple, pair-wise statistics of the inputs at linear sample complexity, while more complex higher-order statistics, such as the fourth cumulant, require at least cubic sample complexity. We also prove that the sample complexity of learning the fourth cumulant is linear if pair-wise and higher-order statistics share a correlated latent structure. Our work describes a key mechanism for how diffusion models can learn distributions of increasing complexity.

artificial intelligence, cit, machine learning, (18 more...)

arXiv.org Machine Learning

2603.12901

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

bd2107343c9cc973635d90dbfc122223-Paper-Conference.pdf

Neural Information Processing SystemsFeb-16-2026, 20:07:21 GMT

artificial intelligence, machine learning, regularization, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.32)

Add feedback

Response to reviewers for the paper: " On Lazy Training in Differentiable Programming "

Neural Information Processing SystemsFeb-13-2026, 14:35:49 GMT

We thank the reviewers for their comments and suggestions. Hereafter, we list reviewers' (sometimes paraphrased) Each answer will translate into a clarification in the final version. Reviewer #2 and #3 felt that our message was lacking clarity. A.2). We will add more pointers to their statistical analysis, from the existing literature (e.g. L81-90 in the main paper, often α(m) = 1/ m in these works).

artificial intelligence, machine learning, reviewer, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Add feedback